Collaborating Authors

human programmer


A systematic evaluation of large language models for generating programming code

Hou, Wenpin, Ji, Zhicheng

arXiv.org Artificial Intelligence

We systematically evaluated the performance of seven large language models in generating programming code using various prompt strategies, programming languages, and task difficulties. GPT-4 substantially outperforms other large language models, including Gemini Ultra and Claude 2. The coding performance of GPT-4 varies considerably with different prompt strategies. In most LeetCode and GeeksforGeeks coding contests evaluated in this study, GPT-4 employing the optimal prompt strategy outperforms 85 percent of human participants. Additionally, GPT-4 demonstrates strong capabilities in translating code between different programming languages and in learning from past errors. The computational efficiency of the code generated by GPT-4 is comparable to that of human programmers. These results suggest that GPT-4 has the potential to serve as a reliable assistant in programming code generation and software development.


LeafAI: query generator for clinical cohort discovery rivaling a human programmer

Dobbins, Nicholas J, Han, Bin, Zhou, Weipeng, Lan, Kristine, Kim, H. Nina, Harrington, Robert, Uzuner, Ozlem, Yetisgen, Meliha

arXiv.org Artificial Intelligence

Objective: Identifying study-eligible patients within clinical databases is a critical step in clinical research. However, accurate query design typically requires extensive technical and biomedical expertise. We sought to create a system capable of generating data model-agnostic queries while also providing novel logical reasoning capabilities for complex clinical trial eligibility criteria. Materials and Methods: The task of query creation from eligibility criteria requires solving several text-processing problems, including named entity recognition and relation extraction, sequence-to-sequence transformation, normalization, and reasoning. We incorporated hybrid deep learning and rule-based modules for these, as well as a knowledge base of the Unified Medical Language System (UMLS) and linked ontologies. To enable data-model agnostic query creation, we introduce a novel method for tagging database schema elements using UMLS concepts. To evaluate our system, called LeafAI, we compared the capability of LeafAI to a human database programmer to identify patients who had been enrolled in 8 clinical trials conducted at our institution. We measured performance by the number of actual enrolled patients matched by generated queries. Results: LeafAI matched a mean 43% of enrolled patients with 27,225 eligible across 8 clinical trials, compared to 27% matched and 14,587 eligible in queries by a human database programmer. The human programmer spent 26 total hours crafting queries compared to several minutes by LeafAI. Conclusions: Our work contributes a state-of-the-art data model-agnostic query generation system capable of conditional reasoning using a knowledge base. We demonstrate that LeafAI can rival an experienced human programmer in finding patients eligible for clinical trials.


Is Model Attention Aligned with Human Attention? An Empirical Study on Large Language Models for Code Generation

Kou, Bonan, Chen, Shengmai, Wang, Zhijie, Ma, Lei, Zhang, Tianyi

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have been shown to be effective for code generation. Due to the complexity and opacity of LLMs, little is known about how these models generate code. To deepen our understanding, we investigate whether LLMs attend to the same parts of a natural language description as human programmers during code generation. An analysis of five LLMs on a popular benchmark, HumanEval, revealed a consistent misalignment between LLMs' and programmers' attention. Furthermore, we found that there is no correlation between the code generation accuracy of LLMs and their alignment with human programmers. Through a quantitative experiment and a user study, we confirmed that, among twelve different attention computation methods, attention computed by the perturbation-based method is most aligned with human attention and is consistently favored by human programmers. Our findings highlight the need for human-aligned LLMs for better interpretability and programmer trust.


Evaluating GPT's Programming Capability through CodeWars' Katas

Zhang, Zizhuo, Wen, Lian, Zhang, Shaoyang, Chen, David, Jiang, Yanfei

arXiv.org Artificial Intelligence

In the burgeoning field of artificial intelligence (AI), understanding the capabilities and limitations of programming-oriented models is crucial. This paper presents a novel evaluation of the programming proficiency of Generative Pretrained Transformer (GPT) models, specifically GPT-3.5 and GPT-4, against coding problems of varying difficulty levels drawn from Codewars. The experiments reveal a distinct boundary at the 3kyu level, beyond which these GPT models struggle to provide solutions. These findings led to the proposal of a measure for coding problem complexity that incorporates both problem difficulty and the time required for solution. The research emphasizes the need for validation and creative thinking capabilities in AI models to better emulate human problem-solving techniques. Future work aims to refine this proposed complexity measure, enhance AI models with these suggested capabilities, and develop an objective measure for programming problem difficulty. The results of this research offer invaluable insights for improving AI programming capabilities and advancing the frontier of AI problem-solving abilities.


Comparing Software Developers with ChatGPT: An Empirical Investigation

Nascimento, Nathalia, Alencar, Paulo, Cowan, Donald

arXiv.org Artificial Intelligence

The automation of particular Software Engineering (SE) tasks has transitioned from theory to reality. Numerous scholarly articles have documented the successful application of Artificial Intelligence to address issues in areas such as project management, modeling, testing, and development. A recent innovation is the introduction of ChatGPT, an ML-infused chatbot, touted as a resource proficient in generating programming code and formulating software testing strategies for developers and testers, respectively. Although there is speculation that AI-based computation can increase productivity and even substitute for software engineers in software development, there is currently a lack of empirical evidence to verify this. Moreover, despite the primary focus on enhancing the accuracy of AI systems, non-functional requirements including energy efficiency, vulnerability, fairness (i.e., human bias), and safety frequently receive insufficient attention. This paper posits that a comprehensive comparison of software engineers and AI-based solutions, considering various evaluation criteria, is pivotal in fostering human-machine collaboration, enhancing the reliability of AI-based methods, and understanding task suitability for humans or AI. Furthermore, it facilitates the effective implementation of cooperative work structures and human-in-the-loop processes. This paper conducts an empirical investigation, contrasting the performance of software engineers and AI systems, like ChatGPT, across different evaluation metrics. The empirical study includes a case of assessing ChatGPT-generated code versus code produced by developers and uploaded to LeetCode.


Humans are Still Better than ChatGPT: Case of the IEEEXtreme Competition

Koubaa, Anis, Qureshi, Basit, Ammar, Adel, Khan, Zahid, Boulila, Wadii, Ghouti, Lahouari

arXiv.org Artificial Intelligence

Since the release of ChatGPT, numerous studies have highlighted the remarkable performance of ChatGPT, which often rivals or even surpasses human capabilities in various tasks and domains. However, this paper presents a contrasting perspective by demonstrating an instance where human performance excels in typical tasks suited for ChatGPT, specifically in the domain of computer programming. We utilize the IEEEXtreme Challenge competition as a benchmark, a prestigious, annual international programming contest encompassing a wide range of problems with different complexities. To conduct a thorough evaluation, we selected and executed a diverse set of 102 challenges, drawn from five distinct IEEEXtreme editions, using three major programming languages: Python, Java, and C++. Our empirical analysis provides evidence that, contrary to popular belief, human programmers maintain a competitive edge over ChatGPT in certain aspects of problem-solving within the programming context. In fact, we found that the average score obtained by ChatGPT on the set of IEEEXtreme programming problems is 3.9 to 5.8 times lower than the average human score, depending on the programming language. This paper elaborates on these findings, offering critical insights into the limitations and potential areas of improvement for AI-based language models like ChatGPT.


Can AI Excel in Programming Like Humans?

#artificialintelligence

Artificial intelligence (AI) has come a long way since its inception, with advancements in machine learning and natural language processing allowing it to perform complex tasks. One area where AI has shown significant potential is in programming, with researchers exploring ways to teach machines to code. However, the question remains: Can AI excel in programming like humans? In this article, we will explore the possibilities and limitations of AI in programming. Before delving into whether AI can excel in programming like humans, it's essential to understand what AI in programming entails.


Will DeepMind's AlphaCode Replace Programmers? - KDnuggets

#artificialintelligence

The Alphabet subsidiary DeepMind has done it again, and this time, they are testing the boundaries of AI in software development sectors. DeepMind's AlphaCode was tested against human performance on coding challenges and ranked among the top 54% of human coders on Codeforces. This is a remarkable achievement, as it is the first of its kind. There are other code-generation machine learning models, such as OpenAI Codex, but none of them has competed directly with human programmers. A coding challenge is like solving puzzles. To solve these challenges, an individual must have an understanding of logic, math, and programming skills.


DeepMind's AlphaCode Explained: Everything You Need to Know

#artificialintelligence

Programming has long been a high-status, high-demand skill. Companies and businesses across industries depend at a very foundational level on the ability of human developers: people who write and understand the language of computers. Recently, with the advent of large language models, AI companies have begun to explore the possibilities of systems that can learn to code. OpenAI's Codex -- embedded into GitHub Copilot -- was the first notable example. Codex can read simple natural language commands and instructions and write code that matches the intention of the user. Yet, writing small programs and solving easy tasks is "far from the full complexity of real-world programming." AI models like Codex lack the problem-solving skills that most programmers rely on in their day-to-day jobs. That's the gap DeepMind wanted to fill with AlphaCode, an AI system that has been trained to "understand" natural language, design algorithms to solve problems, and then implement them in code. AlphaCode displays a unique skill set of natural language understanding and problem-solving ability, combined with the statistical power characteristic of large language models. The system was tested against human programmers on the popular competitive programming platform Codeforces. AlphaCode averaged a ranking of 54.3% across 10 contests, which makes it the first AI to reach the level of human programmers in competitive programming contests. I've studied the AlphaCode paper to understand what AlphaCode is and isn't, what these impressive results mean, what the implications are, and what the future holds for AI and human developers. I've also researched what AI experts and competitive programmers are saying about AlphaCode, so you have different independent perspectives to form your own. This article is a thorough review divided into 6 sections (and their respective subsections). I will include comments throughout the article to explore some questions, ideas, and results in more depth.


When DeepMind's 'AlphaCode' Competed Against Human Programmers

#artificialintelligence

Among at least a few programmers, this has already provoked some concern. Recently a programming student on Hacker News complained of "AlphaCode Anxiety" (as well as worries about GitHub's Copilot). "Now it feels like I'm running against a clock until the career I am working very hard for will automate itself away," the student wrote. When a blog post at CodeForces declared "The future has arrived," one worried programmer even argued that "there is a limit to what humans should automate." The programmer added pointedly that the DeepMind developers who built AlphaCode "think that they are irreplaceable, but they would be the first ones to get replaced." But the fact that AlphaCode finished in the bottom half was also greeted with a very human disparagement. "AI is such a noob," the first commenter responded.